To achieve accurate and low-cost 3D object detection, existing methods enhance camera-based multi-view detectors with spatial cues from the LiDAR modality, e.g., dense depth supervision and bird's-eye-view (BEV) feature distillation. However, they directly conduct point-to-point mimicking from LiDAR to camera, which neglects the inner geometry of foreground targets and suffers from the modal gap between 2D and 3D features. In this paper, we propose a learning scheme, termed TiG-BEV, that transfers Target Inner-Geometry from the LiDAR modality into camera-based BEV detectors for both dense depth and BEV features. First, we introduce an inner-depth supervision module to learn the low-level relative depth relations between different foreground pixels. This enables the camera-based detector to better understand object-wise spatial structures. Second, we design an inner-feature BEV distillation module to imitate the high-level semantics of different keypoints within foreground targets. To further alleviate the BEV feature gap between the two modalities, we adopt both inter-channel and inter-keypoint distillation for feature-similarity modeling. With our target inner-geometry distillation, TiG-BEV boosts BEVDepth by +2.3% NDS and +2.4% mAP, and BEVDet by +9.1% NDS and +10.3% mAP, on the nuScenes val set. Code will be available at https://github.com/ADLab3Ds/TiG-BEV.
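The two ideas in this abstract, relative depth inside a target and similarity-based feature distillation, can be sketched in NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the choice of reference pixel (the one with the smallest depth error), the cosine similarity, and the squared-error losses are all hypothetical.

```python
import numpy as np

def inner_depth_loss(pred_depth, gt_depth):
    """Compare relative depths inside one foreground target.

    Assumption: the reference pixel is the one whose predicted depth is
    closest to the LiDAR depth; the abstract does not specify this.
    """
    pred = np.asarray(pred_depth, dtype=np.float64).ravel()
    gt = np.asarray(gt_depth, dtype=np.float64).ravel()
    ref = int(np.argmin(np.abs(pred - gt)))
    rel_pred = pred - pred[ref]  # depth relative to the reference pixel
    rel_gt = gt - gt[ref]
    return float(np.mean((rel_pred - rel_gt) ** 2))

def cosine_similarity_matrix(x):
    """Pairwise cosine similarity between the rows of x."""
    x = np.asarray(x, dtype=np.float64)
    norm = np.linalg.norm(x, axis=1, keepdims=True)
    x = x / np.maximum(norm, 1e-12)
    return x @ x.T

def similarity_distill_loss(student, teacher):
    """student, teacher: (K, C) features sampled at K keypoints of a target.

    Returns (inter_keypoint, inter_channel) losses, each the mean squared
    difference between student and teacher similarity matrices.
    """
    s = np.asarray(student, dtype=np.float64)
    t = np.asarray(teacher, dtype=np.float64)
    inter_keypoint = np.mean(
        (cosine_similarity_matrix(s) - cosine_similarity_matrix(t)) ** 2)
    inter_channel = np.mean(
        (cosine_similarity_matrix(s.T) - cosine_similarity_matrix(t.T)) ** 2)
    return float(inter_keypoint), float(inter_channel)
```

Because both losses compare relations rather than raw values, a constant depth offset or a global rescaling of the features leaves them at zero. That is one way to read the abstract's point about avoiding direct point-to-point mimicking.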
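3D object detection with multiple sensors is essential for accurate and reliable perception systems in autonomous driving and robotics. Existing 3D detectors significantly improve accuracy by adopting a two-stage paradigm that relies solely on LiDAR point clouds for 3D proposal refinement. Although impressive, the sparsity of point clouds, especially for distant points, makes it difficult for a LiDAR-only refinement module to accurately recognize and localize objects. To address this problem, we propose a novel multi-modality two-stage approach, FusionRCNN, which effectively and efficiently fuses point clouds and camera images within regions of interest (RoIs). FusionRCNN adaptively integrates the sparse geometric information from LiDAR and the dense texture information from the camera in a unified attention mechanism. Specifically, in the RoI extraction step it first uses RoIPooling to obtain an image set of uniform size and obtains the point set by sampling raw points within the proposals; it then applies intra-modality self-attention to enhance the domain-specific features, followed by a carefully designed cross-attention that fuses the information from the two modalities. FusionRCNN is fundamentally plug-and-play and supports different one-stage methods with almost no architectural changes. Extensive experiments on the KITTI and Waymo benchmarks show that our method significantly improves the performance of popular detectors. Notably, FusionRCNN improves the strong SECOND baseline by 6.14% mAP on Waymo and outperforms competing two-stage approaches. Code will be released soon at https://github.com/xxlbigbrother/fusion-rcnn.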
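The cross-attention fusion step mentioned above can be illustrated with single-head scaled dot-product attention, in which point features act as queries over image features. This is a minimal sketch, not FusionRCNN's actual module: the function names, the single head, and the absence of learned projections are simplifying assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def cross_attention(point_feats, image_feats):
    """Fuse image features into point features.

    point_feats: (N, D) features of N points sampled in one RoI (queries).
    image_feats: (M, D) features of M pooled image locations (keys and values).
    Returns the (N, D) attended features and the (N, M) attention weights.
    Hypothetical simplification: no learned query/key/value projections.
    """
    q = np.asarray(point_feats, dtype=np.float64)
    kv = np.asarray(image_feats, dtype=np.float64)
    scores = q @ kv.T / np.sqrt(q.shape[-1])  # scaled dot products
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ kv, weights
```

Each point receives a convex combination of image features. A real module would presumably add learned projections, multiple heads, and a residual connection; those details are not given in the abstract.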
Background. Functional assessment of the right ventricle (RV) using gated myocardial perfusion single-photon emission computed tomography (MPS) relies heavily on the precise extraction of right ventricular contours. In this paper, we present a new deep-learning-based model that integrates both the spatial and temporal features in gated MPS images to segment the RV epicardium and endocardium. Methods. By integrating the spatial features from each cardiac frame of the gated MPS and the temporal features from the sequential cardiac frames of the gated MPS, we developed a Spatial-Temporal V-Net (ST-VNet) for automatic extraction of RV endocardial and epicardial contours. In the ST-VNet, a V-Net is employed to hierarchically extract spatial features, and convolutional long short-term memory (ConvLSTM) units are added to the skip-connection pathway to extract the temporal features. The input of the ST-VNet is the ECG-gated sequential frames of the MPS images, and the output is the probability map of the epicardial or endocardial mask. A Dice similarity coefficient (DSC) loss, which penalizes the discrepancy between the model prediction and the ground truth, was adopted to optimize the segmentation model. Results. Our segmentation model was trained and validated on a retrospective dataset of 45 subjects, and the cardiac cycle of each subject was divided into 8 gates. The proposed ST-VNet achieved a DSC of 0.8914 and 0.8157 for RV epicardium and endocardium segmentation, respectively. The mean absolute error, the mean squared error, and the Pearson correlation coefficient of the RV ejection fraction (RVEF) between the ground truth and the model prediction were 0.0609, 0.0830, and 0.6985, respectively. Conclusion. Our proposed ST-VNet is an effective model for RV segmentation. It has great promise for clinical use in RV functional assessment.
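The DSC loss used to train the model can be written in a few lines. The sketch below is the common soft-Dice formulation in NumPy; the smoothing constant `eps` and the function name are assumptions, since the abstract does not state the exact variant used.

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss: 1 - DSC between a probability map and a binary mask.

    pred: predicted probabilities in [0, 1]; target: ground-truth mask (0 or 1).
    eps is an assumed smoothing term that avoids division by zero.
    """
    pred = np.asarray(pred, dtype=np.float64).ravel()
    target = np.asarray(target, dtype=np.float64).ravel()
    intersection = np.sum(pred * target)
    dsc = (2.0 * intersection + eps) / (np.sum(pred) + np.sum(target) + eps)
    return 1.0 - dsc
```

A perfect prediction gives a loss near 0 and a prediction with no overlap gives a loss near 1, so minimizing this loss maximizes the reported DSC.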
The fifth generation of the Radio Access Network (RAN) has brought new services, technologies, and paradigms with the corresponding societal benefits. However, the energy consumption of 5G networks is today a concern. In recent years, the design of new methods for decreasing RAN power consumption has attracted interest from both the research community and standardization bodies, and many energy-saving solutions have been proposed. However, there is still a need to understand the power consumption behavior of state-of-the-art base station architectures, such as multi-carrier active antenna units (AAUs), as well as the impact of different network parameters. In this paper, we present a power consumption model for 5G AAUs based on artificial neural networks. We demonstrate that this model achieves good estimation performance and that it is able to capture the benefits of energy saving when dealing with the complexity of multi-carrier base station architectures. Importantly, multiple experiments are carried out to show the advantage of designing a general model able to capture the power consumption behaviors of different types of AAUs. Finally, we provide an analysis of the model scalability and the training data requirements.
In this paper, we propose a novel primal-dual proximal splitting algorithm (PD-PSA), named BALPA, for the composite optimization problem with equality constraints, where the loss function consists of a smooth term and a nonsmooth term composed with a linear mapping. In BALPA, the dual update is designed as a proximal point for a time-varying quadratic function, which balances the implementation of the primal and dual updates and retains the proximity-induced feature of classic PD-PSAs. In addition, through this balance, BALPA eliminates the inefficiency of classic PD-PSAs for composite optimization problems in which the Euclidean norm of the linear mapping or of the equality constraint mapping is large. Therefore, BALPA not only inherits the simple structure and easy implementation of classic PD-PSAs but also ensures fast convergence when these norms are large. Moreover, we propose a stochastic version of BALPA (S-BALPA) and apply the developed BALPA to distributed optimization to devise a new distributed optimization algorithm. Furthermore, we conduct a comprehensive convergence analysis for both BALPA and S-BALPA. Finally, numerical experiments demonstrate the efficiency of the proposed algorithms.
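The energy consumption of the fifth generation (5G) of mobile networks is one of the major concerns of the telecom industry. However, there is currently no accurate and tractable approach to evaluating the power consumption of 5G base stations (BSs). In this paper, we propose a novel model for a realistic characterization of the power consumption of 5G multi-carrier BSs, which builds on a large data collection campaign. First, we define a machine learning architecture that allows multiple 5G BS products to be modeled. Then, we exploit the knowledge gathered by this framework to derive a realistic and analytically tractable power consumption model, which can help drive theoretical analyses as well as feature standardization, development, and optimization frameworks. Notably, we demonstrate that this model has high precision and is able to capture the benefits of energy-saving mechanisms. We believe this analytical model is a fundamental tool for understanding 5G BS power consumption and for accurately optimizing network energy efficiency.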
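This paper proposes a novel dual inexact splitting algorithm (DISA) for the distributed convex composite optimization problem, where the local loss function consists of an $L$-smooth term and a possibly nonsmooth term composed with a linear operator. We prove that DISA is convergent when the primal and dual stepsizes $\tau$, $\beta$ satisfy $0 < \tau < 2/L$ and $0 < \tau\beta < 1$. Compared with existing primal-dual proximal splitting algorithms (PD-PSAs), DISA overcomes the dependence of the convergent stepsize range on the Euclidean norm of the linear operator. This implies that DISA allows larger stepsizes when the Euclidean norm is large, thus ensuring its fast convergence. Moreover, we establish the sublinear and linear convergence rates of DISA under general convexity and metric subregularity, respectively. Furthermore, an approximate iterative version of DISA is provided, and the global convergence and sublinear convergence rate of this approximate version are proved. Finally, numerical experiments not only corroborate the theoretical analysis but also show that DISA achieves a significant acceleration compared with existing PD-PSAs.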
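The stepsize condition quoted above is easy to check in code. The helper below is a hypothetical convenience function, not part of DISA itself; it only encodes the two inequalities from the abstract.

```python
def disa_stepsizes_ok(tau, beta, L):
    """Check the condition 0 < tau < 2/L and 0 < tau * beta < 1.

    tau: primal stepsize, beta: dual stepsize, L: smoothness constant of
    the smooth term. The function name and signature are illustrative.
    """
    if L <= 0:
        raise ValueError("L must be positive")
    return 0.0 < tau < 2.0 / L and 0.0 < tau * beta < 1.0
```

The signature takes no operator norm: the admissible range depends only on `L` and on the product `tau * beta`, which is the property the abstract contrasts with existing PD-PSAs.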
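Tree pruning is key to promoting fruit growth and improving production because of its effect on the photosynthetic efficiency of fruits and on nutrient transport in the branches. Currently, pruning still depends heavily on human labor, and workers' experience strongly affects the robustness of pruning performance. Evaluating pruning performance is therefore a challenge for workers and farmers. To better address this problem, this paper proposes a novel pruning classification strategy model, called "OTSU-SVM", that evaluates pruning performance based on the shadows of branches and leaves. The model considers not only the available illuminated area of the tree but also the uniformity of that illuminated area. More importantly, our group incorporates the OTSU algorithm into the model, which greatly enhances the robustness of its evaluation. In addition, data from pear trees in the Yuhang District were used in the experiment. In this experiment, we show that OTSU-SVM achieves good accuracy (80%) and high performance in evaluating the pruning of pear trees. If applied in orchards, it can lead to more successful pruning. Successful pruning can enlarge the illuminated area of individual fruits and increase nutrient transport from the target branch, significantly improving fruit weight and production.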
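The OTSU step that separates shadow from illuminated ground can be sketched with the standard Otsu threshold, which picks the gray level that maximizes the between-class variance of the histogram. This is a generic NumPy implementation, not the paper's pipeline; the `shadow_fraction` feature is a hypothetical example of what might be passed to an SVM.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the threshold t in [0, 255] maximizing between-class variance.

    gray: array of 8-bit gray levels. Pixels <= t form the dark class.
    """
    levels = np.asarray(gray, dtype=np.uint8).ravel()
    hist = np.bincount(levels, minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                 # probability of the dark class
    mu = np.cumsum(prob * np.arange(256))   # cumulative mean gray level
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    sigma_b = np.zeros(256)
    valid = denom > 0                       # skip thresholds with an empty class
    sigma_b[valid] = (mu_total * omega[valid] - mu[valid]) ** 2 / denom[valid]
    return int(np.argmax(sigma_b))

def shadow_fraction(gray):
    """Hypothetical feature: share of pixels classified as shadow."""
    t = otsu_threshold(gray)
    return float(np.mean(np.asarray(gray) <= t))
```

Because the threshold is derived from each image's own histogram, it adapts to overall brightness. This is a plausible reason for the robustness gain the abstract attributes to the OTSU algorithm.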
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple, yet its performance is impressive: CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains strongly robust even if the LiDAR is missing. Code will be released at https://github.com/junjie18/CMT.
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.